1. Introduction

Let $X_1, X_2, \ldots$ be independent, identically distributed random variables with a continuous but unknown distribution function $F$. Denote the empirical distribution function for the sample $X_1, X_2, \ldots, X_n$ by $F_n(x) = \frac{1}{n}\#\{X_i \le x,\ i = 1, \ldots, n\}$. In testing goodness of fit, that is, when we want to test $F = F_0$ for some specific choice of $F_0$, the commonly used test statistics are

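$$
\begin{aligned}
D_n^+ &= \sqrt{n}\,\sup_x \bigl(F_n(x) - F(x)\bigr),\\
D_n^- &= -\sqrt{n}\,\inf_x \bigl(F_n(x) - F(x)\bigr),\\
D_n &= \sqrt{n}\,\sup_x \bigl|F_n(x) - F(x)\bigr|.
\end{aligned}
$$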
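For continuous $F$ the suprema above are attained (or approached) at the order statistics, which gives a direct way to compute all three statistics. The sketch below assumes a user-supplied continuous cdf; the function name `ks_statistics` and the test sample are our own illustrative choices.

```python
import math
from typing import Callable, Sequence, Tuple


def ks_statistics(sample: Sequence[float],
                  cdf: Callable[[float], float]) -> Tuple[float, float, float]:
    """Return (D_n^+, D_n^-, D_n) for a sample under a continuous cdf F."""
    xs = sorted(sample)
    n = len(xs)
    u = [cdf(x) for x in xs]  # F evaluated at the order statistics
    # F_n jumps from (i-1)/n to i/n at the i-th order statistic (i = 1..n):
    # sup(F_n - F) is attained at the jumps, sup(F - F_n) is approached just before them.
    d_plus = max(i / n - u[i - 1] for i in range(1, n + 1))
    d_minus = max(u[i - 1] - (i - 1) / n for i in range(1, n + 1))
    root_n = math.sqrt(n)
    return root_n * d_plus, root_n * d_minus, root_n * max(d_plus, d_minus)


def uniform_cdf(x: float) -> float:
    """Distribution function of the uniform law on (0, 1)."""
    return min(max(x, 0.0), 1.0)


print(ks_statistics([0.9, 0.2], uniform_cdf))
```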


The purpose of this paper is to give a bound for the tail probability of $D_n^-$ in the following form.

Theorem 1. $P\{D_n^- > \sqrt{n}\,\xi\} \le 2\sqrt{2}\, e^{-2n\xi^2}$.
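A quick way to see the theorem at work is to estimate the left-hand side by simulation in the uniform case and compare it with $2\sqrt{2}\,e^{-2n\xi^2}$. The sketch below is a plain Monte Carlo check under arbitrary choices of $n$, $\xi$, the number of replications and the seed; the helper names are ours, and the check illustrates the inequality without proving anything.

```python
import math
import random


def sup_f_minus_fn(u_sorted):
    """sup_x (x - F_n(x)) for a sorted sample from the uniform distribution on (0, 1)."""
    n = len(u_sorted)
    # enumerate's index i is 0-based, so i / n is (i - 1)/n for the 1-based index.
    return max(u - i / n for i, u in enumerate(u_sorted))


def mc_tail(n, xi, reps, seed=0):
    """Monte Carlo estimate of P{D_n^- > sqrt(n) xi} = P{sup_x (x - F_n(x)) > xi}."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        u = sorted(rng.random() for _ in range(n))
        if sup_f_minus_fn(u) > xi:
            hits += 1
    return hits / reps


def theorem1_bound(n, xi):
    """Right-hand side of Theorem 1."""
    return 2.0 * math.sqrt(2.0) * math.exp(-2.0 * n * xi * xi)


n, xi = 20, 0.2
estimate = mc_tail(n, xi, reps=4000, seed=12345)
print(f"MC estimate {estimate:.4f}, bound {theorem1_bound(n, xi):.4f}")
```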

A bound of the form $P\{D_n^- > \sqrt{n}\,\xi\} \le C e^{-2n\xi^2}$, where $C$ is some unspecified constant, was proved by Dvoretzky, Kiefer, and Wolfowitz (1956). Several papers conjecture that $C$ can be taken as 1, cf. Birnbaum and McCarty (1958) and Csörgő and Horváth (1981). Each of these conjectures is supported by considerable numerical computation, although no proof is available. Devroye and Wise (1979) proved an explicit bound on $C$ whose value is less than 306, but this bound is too large to be useful in any application. The best result known to the author (before this paper was written) is $C \le 29$, due to G. Shorack (private communication). The result of this paper is therefore a substantial improvement on all the results known so far, and it gives partial support to the conjecture.


2.  Proof  of  the  Main  Result 

First we introduce some notation and basic facts about exponential families. Assume the distribution function $F$ of $X_1$ can be imbedded in an exponential family, i.e. for all $\theta$ in some neighborhood of 0, $\exp[\psi(\theta)] = \int \exp(\theta x)\, F(dx)$ is finite, so that $\exp[\theta x - \psi(\theta)]\, F(dx)$ defines a family of probability distributions indexed by $\theta$.

It is easy to show that the mean and variance of these distributions are given by $\psi'(\theta)$ and $\psi''(\theta)$ respectively. Hence $\mu = \psi'(\theta)$ is a one-to-one function of $\theta$. It will be convenient to regard this family of distributions as indexed by $\mu$ and write $F_\mu(dx) = \exp[\theta x - \psi(\theta)]\, F(dx)$. Let $P_\mu$ denote the probability according to which $X_1, X_2, \ldots$ are independent with $P_\mu(X_i \in dx) = F_\mu(dx)$ $(i = 1, 2, \ldots)$. The density of $S_n = X_1 + \cdots + X_n$ under $P_\mu$ will be denoted by $f_{\mu,n}$. If $A$ is an event belonging to the $\sigma$-field generated by $X_1, \ldots, X_m$, the following notation will be used: $P^{(m)}_{\mu_0}(A) = P_\mu(A \mid S_m = m\mu_0)$; this conditional probability does not depend on $\mu$, because $S_m$ is sufficient for the family. In this paper we consider only events of the form $A = \{\tau \le k\}$ $(k = 1, 2, \ldots, m)$, where $\tau$ is a stopping time.
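To make the exponential-family notation concrete, the sketch below tilts a small discrete distribution (chosen arbitrarily for illustration, not taken from the paper) and checks numerically that the mean and variance of the tilted law match finite-difference approximations of $\psi'(\theta)$ and $\psi''(\theta)$.

```python
import math

support = [-1.0, 0.0, 2.0]   # hypothetical support of X_1
probs = [0.5, 0.3, 0.2]      # hypothetical probabilities F({x})


def psi(theta):
    """Cumulant generating function: log sum_x exp(theta x) F({x})."""
    return math.log(sum(p * math.exp(theta * x) for x, p in zip(support, probs)))


def tilted(theta):
    """Probabilities of the tilted law exp(theta x - psi(theta)) F({x})."""
    c = psi(theta)
    return [p * math.exp(theta * x - c) for x, p in zip(support, probs)]


theta, h = 0.4, 1e-4
q = tilted(theta)
mean = sum(x * w for x, w in zip(support, q))
var = sum((x - mean) ** 2 * w for x, w in zip(support, q))
d1 = (psi(theta + h) - psi(theta - h)) / (2 * h)                 # approximates psi'(theta)
d2 = (psi(theta + h) - 2 * psi(theta) + psi(theta - h)) / h**2   # approximates psi''(theta)
print(mean, d1, var, d2)
```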

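Siegmund (1982) derived the following fundamental identity:

$$
P^{(m)}_{\mu_0}(\tau \le k) = \exp\{-m[(\theta_2 - \theta_0)\mu_0 + \psi(\theta_0) - \psi(\theta_2)]\} \int_{\{\tau \le k\}} \frac{f_{\mu_2,\,m-\tau}(m\mu_0 - S_\tau)\, \exp[-(\theta_1 - \theta_2) S_\tau]}{f_{\mu_0,m}(m\mu_0)}\, dP_{\mu_1}. \tag{1}
$$

The notation $\mu_i = \psi'(\theta_i)$, $i = 0, 1, 2$, is used above, and $\theta_1, \theta_2$ satisfy $\psi(\theta_1) = \psi(\theta_2)$.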
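For readers who want to see where (1) comes from, here is one way to obtain it by a standard change of measure; it is our own sketch and need not coincide with Siegmund's argument. Because $\{\tau = k\}$ depends only on $X_1, \ldots, X_k$, and $S_m - S_k$ is independent of them with density $f_{\mu_0,m-k}$:

```latex
\begin{aligned}
P^{(m)}_{\mu_0}(\tau = k) &= \frac{1}{f_{\mu_0,m}(m\mu_0)}\int_{\{\tau = k\}} f_{\mu_0,m-k}(m\mu_0 - S_k)\, dP_{\mu_0},\\
\frac{dP_{\mu_0}}{dP_{\mu_1}}\Big|_{X_1,\ldots,X_k} &= \exp\{(\theta_0 - \theta_1)S_k - k[\psi(\theta_0) - \psi(\theta_1)]\},\\
f_{\mu_0,m-k}(y) &= \exp\{(\theta_0 - \theta_2)y - (m-k)[\psi(\theta_0) - \psi(\theta_2)]\}\, f_{\mu_2,m-k}(y).
\end{aligned}
```

Multiplying the last two factors with $y = m\mu_0 - S_k$ and using $\psi(\theta_1) = \psi(\theta_2)$, the exponent collapses to $-(\theta_1 - \theta_2)S_k - m[(\theta_2 - \theta_0)\mu_0 + \psi(\theta_0) - \psi(\theta_2)]$; summing over $k$ gives (1).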

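Let us bring our attention back to $D_n^-$. It is well known that the distribution of $D_n^-$ is the same for all continuous distributions, so without loss of generality we may take $F$ to be the uniform distribution on $(0, 1)$. The well-known representation of uniform order statistics in terms of sums of independent exponential random variables shows that

$$
\begin{aligned}
P\{D_n^- > \sqrt{n}\,\xi\} &= P\Bigl\{\sup_{0<x<1}\bigl(x - F_n(x)\bigr) > \xi\Bigr\}\\
&= P\Bigl\{\max_{1\le i\le n}(W_i - i) > n\xi - 1 \Bigm| W_{n+1} - (n+1) = -1\Bigr\}\\
&= P^{(m)}_{\mu_0}\{\tau < m\},
\end{aligned}
$$

where $W_j = Y_1 + \cdots + Y_j$, $Y_1, Y_2, \ldots$ are independent standard exponential random variables, $m = n + 1$, and $\tau = \inf\{i : W_i - i > n\xi - 1\}$. Here $X_i = Y_i - 1$, so that $S_i = W_i - i$, and the conditioning event is $S_m = -1$, i.e. $\mu_0 = -1/m$.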
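The representation can be checked mechanically: with $U_{(i)} = W_i/W_{n+1}$, the event $\sup_x(x - F_n(x)) > \xi$ is the same as $\max_i(nU_{(i)} - i) > n\xi - 1$, i.e. the walk $W_i - i$, rescaled so that $W_{n+1} = n$, crosses $n\xi - 1$. The sketch below compares the two descriptions on simulated data; the function names, $n$, $\xi$ and the seed are illustrative choices.

```python
import random


def first_passage(ys, xi):
    """tau for the walk W_i - i rescaled so that W_{n+1} = n; None if it never crosses."""
    n = len(ys) - 1
    total = sum(ys)
    w = 0.0
    for i, y in enumerate(ys[:-1], start=1):
        w += y
        if n * w / total - i > n * xi - 1:
            return i
    return None


def sup_x_minus_fn(ys):
    """sup_x (x - F_n(x)) for the uniform order statistics U_(i) = W_i / W_{n+1}."""
    n = len(ys) - 1
    total = sum(ys)
    w, best = 0.0, float("-inf")
    for i, y in enumerate(ys[:-1], start=1):
        w += y
        best = max(best, w / total - (i - 1) / n)
    return best


def count_mismatches(n, xi, reps, seed=7):
    """Number of simulated samples on which the two descriptions of the event disagree."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(reps):
        ys = [rng.expovariate(1.0) for _ in range(n + 1)]
        if (first_passage(ys, xi) is not None) != (sup_x_minus_fn(ys) > xi):
            bad += 1
    return bad


print(count_mismatches(n=10, xi=0.15, reps=2000))
```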

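For reasons which will be indicated later, we divide the set $\{\tau < m\}$ into two parts, $\{\tau \le (n+1)/2\} \cup \{(n+1)/2 < \tau < m\}$, and apply a time-reversal argument to the latter part, i.e.

$$
P^{(m)}_{\mu_0}\{\tau < m\} \le P^{(m)}_{\mu_0}\{\tau \le (n+1)/2\} + P^{(m)}_{\nu_0}\{T \le n/2\},
$$

where $\nu_0 = 1/m$, $T = \inf\{i : S_i > n\xi\}$, and in the second term $S_i$ denotes the time-reversed walk, which (before conditioning) has the same distribution as $i - W_i$ $(i = 1, \ldots, n+1)$. By (1) we have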

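$$
P^{(m)}_{\mu_0}\{\tau \le (n+1)/2\} = \exp\{-m[(\theta_2 - \theta_0)\mu_0 + \psi(\theta_0) - \psi(\theta_2)]\} \int_{\{\tau \le (n+1)/2\}} \frac{f_{\mu_2,\,m-\tau}(m\mu_0 - S_\tau)\exp[-(\theta_1 - \theta_2)S_\tau]}{f_{\mu_0,m}(m\mu_0)}\, dP_{\mu_1} \tag{2}
$$

and

$$
P^{(m)}_{\nu_0}\{T \le n/2\} = \exp\{-m[(\lambda_2 - \lambda_0)\nu_0 + \phi(\lambda_0) - \phi(\lambda_2)]\} \int_{\{T \le n/2\}} \frac{g_{\nu_2,\,m-T}(m\nu_0 - S_T)\exp[-(\lambda_1 - \lambda_2)S_T]}{g_{\nu_0,m}(m\nu_0)}\, dP_{\nu_1}, \tag{3}
$$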

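where

$$\psi(\theta) = -\theta - \log(1 - \theta), \qquad \mu = \psi'(\theta) = \frac{\theta}{1-\theta}, \qquad -\infty < \theta < 1,$$

$$\phi(\lambda) = \lambda - \log(1 + \lambda), \qquad \nu = \phi'(\lambda) = \frac{\lambda}{1+\lambda}, \qquad -1 < \lambda < \infty,$$

$$f_{\mu,k}(x) = \frac{(1-\theta)^k}{(k-1)!}\,(x + k)^{k-1}\exp[-(1-\theta)(x+k)], \qquad x > -k,\ -\infty < \theta < 1,$$

$$g_{\nu,k}(y) = \frac{(1+\lambda)^k}{(k-1)!}\,(k - y)^{k-1}\exp[(1+\lambda)(y - k)], \qquad y < k,\ -1 < \lambda < \infty,$$

and $\theta_2 < 0 < \theta_1 < 1$ satisfy $\psi(\theta_2) = \psi(\theta_1)$, and $-1 < \lambda_2 < 0 < \lambda_1$ satisfy $\phi(\lambda_2) = \phi(\lambda_1)$. In particular $\theta_0 = -1/n$ and $\lambda_0 = 1/n$, since $\mu_0 = -1/m$ and $\nu_0 = 1/m$.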
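These formulas are easy to check numerically. The sketch below implements $\psi$, $\phi$, $f_{\mu,k}$ and $g_{\nu,k}$ as written above, finds the partner $\theta_2 < 0$ of a given $\theta_1$ by bisection (using that $\psi$ decreases on $(-\infty, 0)$), and verifies by midpoint quadrature that $f_{\mu,k}$ and $g_{\nu,k}$ integrate to 1 with means $k\psi'(\theta)$ and $k\phi'(\lambda)$. The helper names and test values are our own.

```python
import math


def psi(theta):               # defined for theta < 1
    return -theta - math.log1p(-theta)


def phi(lam):                 # defined for lam > -1
    return lam - math.log1p(lam)


def f_density(x, theta, k):
    """Density of S_k = W_k - k when the Y_i are exponential with rate 1 - theta."""
    if x <= -k:
        return 0.0
    r = 1.0 - theta
    return math.exp(k * math.log(r) + (k - 1) * math.log(x + k)
                    - r * (x + k) - math.lgamma(k))


def g_density(y, lam, k):
    """Density of k - W_k when the Y_i are exponential with rate 1 + lam."""
    if y >= k:
        return 0.0
    r = 1.0 + lam
    return math.exp(k * math.log(r) + (k - 1) * math.log(k - y)
                    - r * (k - y) - math.lgamma(k))


def partner(theta1, iters=200):
    """The theta2 < 0 with psi(theta2) = psi(theta1), for 0 < theta1 < 1."""
    target = psi(theta1)
    lo, hi = -1.0, 0.0
    while psi(lo) < target:   # psi grows without bound as theta -> -infinity
        lo *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if psi(mid) < target:  # psi decreases on (-inf, 0), so the root lies to the left
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def moments(density, a, b, steps=100_000):
    """Midpoint-rule mass and mean of a density on (a, b)."""
    h = (b - a) / steps
    mass = first = 0.0
    for j in range(steps):
        x = a + (j + 0.5) * h
        w = density(x) * h
        mass += w
        first += x * w
    return mass, first / mass


theta, k = 0.3, 3
mass, mean = moments(lambda x: f_density(x, theta, k), -k, -k + 80.0 / (1 - theta))
print(mass, mean, k * theta / (1 - theta), partner(0.4))
```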

We work with (2) first. Under $P_{\mu_1}$, the increments of the random walk $S_i$ have an exponential right tail. The following lemma is a direct consequence; the proof is omitted.

Lemma 1. Under $P_{\mu_1}$, $R_m = S_\tau - (n\xi - 1)$ is independent of $\tau$ and has an exponential distribution with parameter $1 - \theta_1$.
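Lemma 1 is the memoryless property at work: the walk can only cross the level $n\xi - 1$ by a jump whose exponential part exceeds the remaining gap, and the excess is again exponential with rate $1 - \theta_1$. The simulation sketch below, with an arbitrary level, $\theta_1$ and seed, checks that the overshoot has mean close to $1/(1-\theta_1)$, both overall and separately for early and late crossings.

```python
import random
import statistics


def overshoots(theta1, level, reps, seed=1):
    """Simulate (tau, S_tau - level) for the walk with increments Y - 1, Y ~ Exp(rate 1 - theta1)."""
    rng = random.Random(seed)
    rate = 1.0 - theta1   # 0 < theta1 < 1 gives positive drift theta1 / (1 - theta1)
    out = []
    for _ in range(reps):
        s, k = 0.0, 0
        while True:
            k += 1
            s += rng.expovariate(rate) - 1.0
            if s > level:
                out.append((k, s - level))
                break
    return out


theta1, level = 0.3, 3.0
pairs = overshoots(theta1, level, reps=20_000)
taus = [k for k, _ in pairs]
med = statistics.median(taus)
early = [r for k, r in pairs if k <= med]
late = [r for k, r in pairs if k > med]
print(statistics.fmean(early), statistics.fmean(late), 1 / (1 - theta1))
```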
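By Lemma 1,

$$
\begin{aligned}
&\int_{\{\tau \le (n+1)/2\}} f_{\mu_2,\,m-\tau}(m\mu_0 - S_\tau)\exp[-(\theta_1 - \theta_2)S_\tau]\, dP_{\mu_1} \Big/ f_{\mu_0,m}(m\mu_0)\\
&\quad= \sum_{k=1}^{[(n+1)/2]} \int_{\{\tau = k\}} f_{\mu_2,\,m-k}(-n\xi - R_m)\exp[-(\theta_1 - \theta_2)R_m]\, dP_{\mu_1} \cdot \exp[-(\theta_1 - \theta_2)(n\xi - 1)] \Big/ f_{\mu_0,m}(m\mu_0)\\
&\quad= \exp[-(\theta_1 - \theta_2)(n\xi - 1)]\,(1 - \theta_1) \sum_{k=1}^{[(n+1)/2]} P_{\mu_1}(\tau = k) \int_0^{m-k-n\xi} f_{\mu_2,\,m-k}(-n\xi - x)\exp[-x(1 - \theta_2)]\, dx \Big/ f_{\mu_0,m}(m\mu_0)\\
&\quad= \exp[-(\theta_1 - \theta_2)n\xi] \sum_{k=1}^{[(n+1)/2]} P_{\mu_1}(\tau = k)\, f_{\mu_2,\,m-k+1}(-n\xi - 1) \Big/ f_{\mu_0,m}(m\mu_0).
\end{aligned}
$$

The last equality uses $\int_0^{m-k-n\xi} f_{\mu_2,m-k}(-n\xi - x)\, e^{-x(1-\theta_2)}\,dx = f_{\mu_2,m-k+1}(-n\xi - 1)/(1 - \theta_2)$ together with $e^{\theta_1 - \theta_2} = (1 - \theta_2)/(1 - \theta_1)$, which is a restatement of $\psi(\theta_1) = \psi(\theta_2)$.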
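The last equality can be checked numerically. The sketch below, for one arbitrary choice of $n$, $\xi$, $k$ and $\theta_1$ (with $\theta_2$ obtained by bisection), integrates the left-hand side against the exponential law of $R_m$ from Lemma 1 and compares it with $\exp[-(\theta_1-\theta_2)n\xi]\,f_{\mu_2,m-k+1}(-n\xi-1)$; the helpers repeat the definitions of $\psi$ and $f_{\mu,k}$ given earlier, and their names are ours.

```python
import math


def psi(theta):
    return -theta - math.log1p(-theta)


def partner(theta1, iters=200):
    """theta2 < 0 with psi(theta2) = psi(theta1), found by bisection."""
    target, lo, hi = psi(theta1), -1.0, 0.0
    while psi(lo) < target:
        lo *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (lo, mid) if psi(mid) < target else (mid, hi)
    return 0.5 * (lo + hi)


def f_density(x, theta, k):
    """f_{mu,k}(x): density of W_k - k with W_k ~ Gamma(k, rate 1 - theta)."""
    if x <= -k:
        return 0.0
    r = 1.0 - theta
    return math.exp(k * math.log(r) + (k - 1) * math.log(x + k) - r * (x + k) - math.lgamma(k))


def lhs(n, xi, k, theta1, steps=100_000):
    """exp[-(t1-t2)(n xi - 1)] E[f_{mu2,m-k}(-n xi - R) exp(-(t1-t2) R)], R ~ Exp(1 - t1)."""
    theta2, m = partner(theta1), n + 1
    upper = (m - k) - n * xi       # f_{mu2,m-k}(-n xi - x) vanishes for x >= upper
    h, total = upper / steps, 0.0
    for j in range(steps):
        x = (j + 0.5) * h
        total += (f_density(-n * xi - x, theta2, m - k)
                  * math.exp(-(theta1 - theta2) * x)
                  * (1.0 - theta1) * math.exp(-(1.0 - theta1) * x))
    return math.exp(-(theta1 - theta2) * (n * xi - 1.0)) * total * h


def rhs(n, xi, k, theta1):
    theta2, m = partner(theta1), n + 1
    return math.exp(-(theta1 - theta2) * n * xi) * f_density(-n * xi - 1.0, theta2, m - k + 1)


print(lhs(10, 0.3, 2, 0.4), rhs(10, 0.3, 2, 0.4))
```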


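Observe that $f_{\mu_2,\,m-k+1}(x)$ is maximized at $x = (m-k)/(1-\theta_2) - (m-k+1)$, and the maximized value is

$$\frac{(1-\theta_2)(m-k)^{m-k}e^{-(m-k)}}{(m-k)!},$$

while, since $m\mu_0 = -1$ and $\theta_0 = -1/n$,

$$f_{\mu_0,m}(m\mu_0) = \frac{m^m e^{-m}}{(m-1)\,[(m-1)!]}.$$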
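Both displayed values can be confirmed numerically. In the sketch below (test values arbitrary, helper names ours) the density is evaluated at the claimed maximizer $x^* = (m-k)/(1-\theta_2) - (m-k+1)$ and at nearby points, and $f_{\mu_0,m}(m\mu_0)$ is evaluated with $\theta_0 = -1/n$, the value for which $\mu_0 = \psi'(\theta_0) = -1/m$.

```python
import math


def f_density(x, theta, k):
    """f_{mu,k}(x) with mu = theta / (1 - theta)."""
    if x <= -k:
        return 0.0
    r = 1.0 - theta
    return math.exp(k * math.log(r) + (k - 1) * math.log(x + k) - r * (x + k) - math.lgamma(k))


def claimed_max(theta2, j):
    """(1 - theta2) j^j e^{-j} / j!, the claimed maximum of f_{mu2, j+1}, with j = m - k >= 1."""
    return (1.0 - theta2) * j**j * math.exp(-j) / math.factorial(j)


def f_at_null(n):
    """f_{mu0,m}(m mu0) with m = n + 1, m mu0 = -1 and theta0 = -1/n."""
    return f_density(-1.0, -1.0 / n, n + 1)


theta2, j = -0.5, 4
x_star = j / (1.0 - theta2) - (j + 1)
print(f_density(x_star, theta2, j + 1), claimed_max(theta2, j))

n = 10
m = n + 1
print(f_at_null(n), m**m * math.exp(-m) / ((m - 1) * math.factorial(m - 1)))
```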

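Substituting these results into the expression above, we have an upper bound of the form

$$
(1-\theta_2)\exp[-(\theta_1-\theta_2)n\xi] \sum_{k=1}^{[(n+1)/2]} P_{\mu_1}(\tau = k)\, \frac{e^{k}(m-k)^{m-k}(m-1)\,[(m-1)!]}{(m-k)!\; m^m}.
$$

Using Stirling's formula with upper and lower bounds (see e.g. Feller, Vol. I, page 54), we find that the expression above is bounded by

$$
\begin{aligned}
&(1-\theta_2)\exp[-(\theta_1-\theta_2)n\xi]\; e\Bigl(\frac{n}{n+1}\Bigr)^{n+1} \sum_{k=1}^{[(n+1)/2]} P_{\mu_1}(\tau = k)\Bigl(\frac{n}{n+1-k}\Bigr)^{1/2} e^{1/(12n)}\\
&\quad\le (1-\theta_2)\exp[-(\theta_1-\theta_2)n\xi]\; e\Bigl(\frac{n}{n+1}\Bigr)^{n+1}\sqrt{2},
\end{aligned}
$$

because $k \le (n+1)/2$ gives $n/(n+1-k) \le 2n/(n+1)$, $e^{1/(6n)} \le (n+1)/n$, and the probabilities sum to at most 1. Since $\exp\{-m[(\theta_2-\theta_0)\mu_0 + \psi(\theta_0) - \psi(\theta_2)]\} = e^{-1}\bigl((n+1)/n\bigr)^{n+1} e^{n\psi(\theta_2)}/(1-\theta_2)$, (2) gives

$$
P^{(m)}_{\mu_0}\{\tau \le (n+1)/2\} \le \sqrt{2}\,\exp\{-n[(\theta_1-\theta_2)\xi - \psi(\theta_2)]\}.
$$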
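After the two maximal values are substituted, the $k$-th term of the sum carries the factor $Q_k = e^k(m-k)^{m-k}(m-1)[(m-1)!]/[(m-k)!\,m^m]$, and Stirling's bounds $\sqrt{2\pi}\,n^{n+1/2}e^{-n} < n! < \sqrt{2\pi}\,n^{n+1/2}e^{-n+1/(12n)}$ give $Q_k < e\,(n/(n+1))^{n+1}\sqrt{2}$ for $1 \le k \le (n+1)/2$. The sketch below checks both facts numerically for $n \le 100$, computing $Q_k$ in logarithms; it is a sanity check, not a proof, and the helper names are ours.

```python
import math


def stirling(n):
    """sqrt(2 pi) n^{n + 1/2} e^{-n}."""
    return math.sqrt(2.0 * math.pi) * n ** (n + 0.5) * math.exp(-n)


def ratio_q(n, k):
    """e^k (m-k)^{m-k} (m-1)[(m-1)!] / ((m-k)! m^m) with m = n + 1, via logarithms."""
    m, j = n + 1, n + 1 - k
    log_q = (k + j * math.log(j) + math.log(m - 1) + math.lgamma(m)
             - math.lgamma(j + 1) - m * math.log(m))
    return math.exp(log_q)


def ratio_bound(n):
    """e (n / (n + 1))^{n + 1} sqrt(2)."""
    return math.e * (n / (n + 1)) ** (n + 1) * math.sqrt(2.0)


for n in (1, 5, 20, 100):
    worst = max(ratio_q(n, k) for k in range(1, (n + 1) // 2 + 1))
    print(n, worst, ratio_bound(n))
```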
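The process for bounding (3) is much the same, although we lose the independence of $S_T - n\xi$ and $T$. Since the increments of the reversed walk are at most 1, $n\xi < S_T \le n\xi + 1$, and

$$
\begin{aligned}
&\int_{\{T \le n/2\}} g_{\nu_2,\,m-T}(m\nu_0 - S_T)\exp[-(\lambda_1 - \lambda_2)S_T]\, dP_{\nu_1} \Big/ g_{\nu_0,m}(m\nu_0)\\
&\quad\le \sum_{k=1}^{[n/2]} \int_{n\xi}^{n\xi+1} g_{\nu_2,\,m-k}(1 - y)\exp[-(\lambda_1 - \lambda_2)y]\, P_{\nu_1}(T = k,\ S_T \in dy) \Big/ g_{\nu_0,m}(m\nu_0)\\
&\quad\le \exp[-(\lambda_1 - \lambda_2)n\xi] \sum_{k=1}^{[n/2]} \int_{n\xi}^{n\xi+1} g_{\nu_2,\,m-k}(1 - y)\, P_{\nu_1}(T = k,\ S_T \in dy) \Big/ g_{\nu_0,m}(m\nu_0).
\end{aligned}
$$

From this step on the argument is the same as above. Substituting the maximal value of $g_{\nu_2,\,m-k}$ and using Stirling's formula carefully, we arrive at

$$
P^{(m)}_{\nu_0}\{T \le n/2\} \le \sqrt{2}\,\exp\{-n[(\lambda_1 - \lambda_2)\xi - \phi(\lambda_2)]\}.
$$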

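To complete the proof it is sufficient to show the following.

Lemma 2. For every $\xi > 0$,

$$
\max\,[(\theta_1 - \theta_2)\xi - \psi(\theta_2)] \ge 2\xi^2,
$$

where the maximum is taken over $\theta_2 < 0 < \theta_1 < 1$ with $\psi(\theta_1) = \psi(\theta_2)$. Since $\phi(\lambda) = \psi(-\lambda)$, the substitution $\lambda_1 = -\theta_2$, $\lambda_2 = -\theta_1$ shows that the same inequality holds for $(\lambda_1 - \lambda_2)\xi - \phi(\lambda_2)$.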


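Proof of Lemma 2. Using the method of Lagrange multipliers, it is easy to show that $(\theta_1 - \theta_2)\xi - \psi(\theta_2)$ is maximized at $\theta_1$ and $\theta_2$ satisfying

$$
\xi\Bigl[\frac{1}{\psi'(\theta_1)} - \frac{1}{\psi'(\theta_2)}\Bigr] = 1, \qquad \psi(\theta_1) = \psi(\theta_2). \tag{4}
$$

Since $\psi'(\theta) = \theta/(1-\theta)$, the first condition reads $\xi(1/\theta_1 - 1/\theta_2) = 1$.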
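Although (4) has no closed-form solution, it is easy to solve numerically, which gives an independent check of Lemma 2 for particular values of $\xi$. The sketch below uses nested bisection: the inner one finds $\theta_2 < 0$ with $\psi(\theta_2) = \psi(\theta_1)$, and the outer one finds the root in $\theta_1$ of $\xi(1/\theta_1 - 1/\theta_2) - 1$, which decreases from $+\infty$ towards $\xi - 1 < 0$ on $(0, 1)$ when $0 < \xi < 1$. It checks a few values of $\xi$ only, and the helper names are ours.

```python
import math


def psi(theta):
    return -theta - math.log1p(-theta)


def partner(theta1, iters=200):
    """theta2 < 0 with psi(theta2) = psi(theta1)."""
    target, lo, hi = psi(theta1), -1.0, 0.0
    while psi(lo) < target:
        lo *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (lo, mid) if psi(mid) < target else (mid, hi)
    return 0.5 * (lo + hi)


def solve_eq4(xi, iters=100):
    """Root (theta1, theta2) of xi (1/theta1 - 1/theta2) = 1, psi(theta1) = psi(theta2); 0 < xi < 1."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        t2 = partner(mid)
        if xi * (1.0 / mid - 1.0 / t2) > 1.0:   # still to the left of the root
            lo = mid
        else:
            hi = mid
    t1 = 0.5 * (lo + hi)
    return t1, partner(t1)


def exponent(xi, t1):
    """(theta1 - theta2) xi - psi(theta2) along the constraint psi(theta1) = psi(theta2)."""
    t2 = partner(t1)
    return (t1 - t2) * xi - psi(t2)


for xi in (0.1, 0.25, 0.5):
    t1, t2 = solve_eq4(xi)
    print(xi, t1, t2, exponent(xi, t1), 2 * xi * xi)
```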


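Equation (4) is a transcendental equation which is difficult to solve explicitly, but there is an easy way out. Dvoretzky, Kiefer, and Wolfowitz (1956) proved

$$
P\{D_n^- > \sqrt{n}\,\xi\} \le C_1 e^{-2n\xi^2}.
$$

Siegmund (1982) showed

$$
P\{D_n^- > \sqrt{n}\,\xi\} \sim C_2(\xi)\, e^{-n[(\theta_1 - \theta_2)\xi - \psi(\theta_2)]} \qquad (n \to \infty),
$$

where $\theta_1$ and $\theta_2$ satisfy (4) and $C_2(\xi)$ is a constant depending only on $\xi$. These two results imply

$$
\liminf_{n\to\infty} \frac{C_1 e^{-2n\xi^2}}{C_2(\xi)\, e^{-n[(\theta_1 - \theta_2)\xi - \psi(\theta_2)]}} \ge 1.
$$

Suppose $(\theta_1 - \theta_2)\xi - \psi(\theta_2) < 2\xi^2$ for some $\xi$; then

$$
\lim_{n\to\infty} \frac{C_1 e^{-2n\xi^2}}{C_2(\xi)\, e^{-n[(\theta_1 - \theta_2)\xi - \psi(\theta_2)]}} = 0.
$$

This is a contradiction. Consequently Lemma 2 is true, and the proof of Theorem 1 is complete.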


3.  Concluding  Remarks 

(i) Theorem 1 is useful in determining confidence bounds, cf. Birnbaum and McCarty (1958) and Csörgő and Horváth (1981).
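For instance, solving $2\sqrt{2}\,e^{-2n\xi^2} = \alpha$ for $\xi$ gives the half-width $\xi = \sqrt{\log(2\sqrt{2}/\alpha)/(2n)}$, and by Theorem 1, $F(x) \le F_n(x) + \xi$ for all $x$ with probability at least $1 - \alpha$. The sketch below computes this half-width and, for comparison only, the one that the conjectured constant $C = 1$ would give; the function name is ours.

```python
import math


def upper_band_halfwidth(n, alpha, c=2.0 * math.sqrt(2.0)):
    """xi solving c * exp(-2 n xi^2) = alpha, i.e. P{sup_x (F - F_n) > xi} <= alpha if the bound holds with constant c."""
    if not 0.0 < alpha < c:
        raise ValueError("need 0 < alpha < c")
    return math.sqrt(math.log(c / alpha) / (2.0 * n))


n, alpha = 100, 0.05
xi_thm = upper_band_halfwidth(n, alpha)           # constant 2*sqrt(2) from Theorem 1
xi_conj = upper_band_halfwidth(n, alpha, c=1.0)   # conjectured constant C = 1
print(f"half-width with 2*sqrt(2): {xi_thm:.4f}, with C = 1: {xi_conj:.4f}")
```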

(ii) It is also possible to derive a bound of the form $P\{D_n^- > \sqrt{n}\,\xi\} \le C e^{-2n\xi^2}$, with a different explicit factor $C$, by working on (2) only. This bound is strictly better than Theorem 1 for sufficiently large $\xi$, but the result is poor when $\xi$ is small. This is the reason why we split the set $\{\tau < m\}$ into two parts and use a different argument on each part.

(iii) Birnbaum and Tingey (1951) gave the exact distribution of $D_n^-$, but their formula is inconvenient for numerical calculation. This is one of the reasons why a bound like Theorem 1 may be useful in applications.
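For moderate $n$ the exact probability can nevertheless be evaluated by machine. The sketch below assumes the form in which the Birnbaum-Tingey formula is usually quoted, $P\{\sup_x(F_n(x) - F(x)) > \xi\} = \xi\sum_{j=0}^{\lfloor n(1-\xi)\rfloor}\binom{n}{j}(1 - \xi - j/n)^{n-j}(\xi + j/n)^{j-1}$, which by symmetry is also the distribution of $D_n^-/\sqrt{n}$ for continuous $F$. The values for $n = 1, 2$ used in the checks can be verified by hand.

```python
import math


def birnbaum_tingey(n, xi):
    """P{sup_x (F_n(x) - F(x)) > xi} for continuous F and 0 < xi < 1 (assumed standard form)."""
    total = 0.0
    for j in range(math.floor(n * (1.0 - xi)) + 1):
        base = max(1.0 - xi - j / n, 0.0)   # guards against a tiny negative from rounding
        total += math.comb(n, j) * base ** (n - j) * (xi + j / n) ** (j - 1)
    return xi * total


def theorem1_bound(n, xi):
    return 2.0 * math.sqrt(2.0) * math.exp(-2.0 * n * xi * xi)


for n in (10, 50, 200):
    xi = 1.0 / math.sqrt(n)   # the event D_n^- > 1
    print(n, birnbaum_tingey(n, xi), theorem1_bound(n, xi))
```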

(iv) At first sight, the conjecture mentioned in Section 1 seems unlikely to be true when compared with the asymptotic result $\lim_{n\to\infty} P\{D_n^- > \xi\} = e^{-2\xi^2}$. However, Smirnov's (1944) result $P\{D_n^- > \xi\} = \exp[-2\xi(\xi + (3\sqrt{n})^{-1})] + o(n^{-1/2})$, which suggests that $D_n^-$ approaches its asymptotic distribution from below, serves as analytical support for the conjecture.
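The effect of Smirnov's correction term is easy to see numerically. The sketch below compares, at $\xi = 1$ on the scale of $D_n^-$ itself, the limit $e^{-2\xi^2}$, Smirnov's approximation $\exp[-2\xi(\xi + (3\sqrt{n})^{-1})]$ (with the $o(n^{-1/2})$ term dropped), and the exact value, computed with the commonly quoted form of the Birnbaum-Tingey formula, which we assume here and evaluate at $\xi/\sqrt{n}$.

```python
import math


def birnbaum_tingey(n, eps):
    """P{sup_x (F_n(x) - F(x)) > eps}, commonly quoted form (assumed)."""
    total = 0.0
    for j in range(math.floor(n * (1.0 - eps)) + 1):
        base = max(1.0 - eps - j / n, 0.0)
        total += math.comb(n, j) * base ** (n - j) * (eps + j / n) ** (j - 1)
    return eps * total


def limit(xi):
    """Asymptotic tail e^{-2 xi^2}."""
    return math.exp(-2.0 * xi * xi)


def smirnov(xi, n):
    """Smirnov's approximation without the o(n^{-1/2}) remainder."""
    return math.exp(-2.0 * xi * (xi + 1.0 / (3.0 * math.sqrt(n))))


xi = 1.0
for n in (10, 100, 400):
    exact = birnbaum_tingey(n, xi / math.sqrt(n))
    print(f"n={n:4d}  exact={exact:.5f}  smirnov={smirnov(xi, n):.5f}  limit={limit(xi):.5f}")
```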

Acknowledgement. The author is greatly indebted to Professor David Siegmund for suggesting this problem and providing many helpful comments.